Abstract

Since the 1990s, the multimodal turn in discourse studies has made multimodal discourse analysis a popular topic in
linguistics and communication studies. An important approach to applying Systemic Functional Linguistics to
non-verbal modes is Visual Grammar, initially proposed by Kress and van Leeuwen (1996). Considering that
commercial advertising is an indispensable part of modern society and bears rich meanings worth
discussing, this paper analyzes the visual components of the advertisement produced by Tmall for the Double
Eleven Shopping Carnival from the perspective of Visual Grammar. By analyzing the representational, interactive
and compositional meanings presented in the advertisement, this article illustrates how visual components
attract viewers and justify consumption behavior by appealing to the cultural
and social context. It also sheds some light on raising consumers’ awareness by showing how advertisement
producers practice psychological manipulation on viewers.

Keywords: visual grammar, multimodal discourse analysis, the Double Eleven Shopping Carnival, commercial 
advertisement, multimodal literacy 

1. Introduction 

Chinese people’s interest in festivals stretches back through the ages, but the ways of celebrating have changed over
time. For many people, consumption has become the central feature of modern festivals, and this irreversible
trend has given rise to man-made festivals. For example, Foodie’s Day was created by merchants on
May 17th because 517 resembles the sound of “I want to eat” in Chinese, and May 20th was made a day for couples
because 520 resembles “I love you” in Chinese. The most famous example, however, is the Double Eleven Shopping
Carnival. Molded by Tmall, November 11th was initially made a day for singles in 2009. In the first year, its sales
volume reached 5 million RMB (AdMaster, 2015), which proved to be a huge success. The following years
witnessed a steady increase in sales volume, and the Double Eleven Shopping Carnival became the most
famous man-made online shopping festival in China.

Under these circumstances, the reasons behind Double Eleven’s popularity are worth researching. Although it started
as a shopping carnival, it has now become part of popular culture, with a total sales volume of over 91
billion RMB in the 2015 event and a participation rate of up to 79% among netizens in 2014 (Ipsos, 2013).

For many years, text analysis or discourse analysis has been the central focus of the linguistic field, and language has
been the exclusive object of research. Studies in the field of multimodal analysis have focused
on static texts such as newspapers and magazines (Zhang, 2007), print advertisements and posters (Yu,
2013) and textbooks, and on dynamic ones such as public service advertisements (Wang, 2012; Qian, 2014), forensic texts
(Guo, 2014) and movies (Luo, 2010). However, little attention has been paid to how commercial
advertisements are structured and how they resonate with consumers and thus trigger consumption.

As the embodiment of the spirit of Double Eleven, the commercial advertisement produced by Tmall makes a
good subject for the study. This paper examines the Tmall Double Eleven advertisement launched in 2014,
discusses the cultural background and marketing motives behind it, and hopes
to draw more attention to multimodal study.

This study aims to address the following two questions: 

a. How does the advertisement appeal to and resonate with consumers through multimodal construction?
b. How does the advertisement justify the man-made festival through visual presentation?

2. Literature Review 

2.1 Multimodal Discourse Analysis 

The earliest scholar to study multimodal discourse was R. Barthes, who discussed the interaction of language and
image in expressing meanings in Rhetoric of the Image (1977). In the 1990s, the field attracted more attention
in academia. As multimodal discourse analysis flourished, two research approaches came to be widely applied,
namely the multimodal metaphor approach and the systemic functional linguistic (SFL) approach. The multimodal metaphor
approach, represented by Forceville (1996), studies multimodal discourse from a cognitive perspective. The
representative of SFL is Halliday, who interpreted the functional relationship between language and social
structure, worked on the definition of linguistics and laid the theoretical groundwork of SFL in his book An
Introduction to Functional Grammar in 1994. Building on his contribution to functional linguistic theory,
O’Toole (1994) and Kress and van Leeuwen (1996) laid the foundations of multimodal discourse analysis (MDA).
Apart from these theorists, other scholars including Martinec (2000) and O’Halloran (2004)
have also made theoretical contributions at the early stage of the development of multimodal discourse analysis from
a systemic functional perspective.

In recent years, more theorists have started doing interdisciplinary studies that put more emphasis on the integration of
different research methods in their analysis instead of merely discussing social semiotics (Forceville &
Urios-Aparisi, 2009; Feng & Jing, 2011). Digital technology has been borrowed for the annotation and modelling of
complex multimodal discourse (Lim, 2011; O’Halloran et al., 2012). Multimodal corpora have been
established and multimodal retrieval software has been built for further analysis (Baldry & Thibault, 2008; Gu, 2006,
2009).

Apart from the variation in research methods, the subjects of research are also expanding to include
three-dimensional space (Stenglin, 2011; Ventola, 2011), paralinguistic features (Hood, 2011; Knight, 2011),
picture and comic books (Feng & O’Halloran, 2012; Painter, Martin, & Unsworth, 2013) and situated
discourse (Gu, 2006, 2009). From the previous studies, it can be concluded that the most innovative progress has been
made in the multimodal analysis of movies and new media. However, the existing multimodal theories are still
in need of improvement; for example, the theoretical frameworks of visual grammar and multimodal metaphor
remain insufficient. Also, the trend of multimodal analysis is going beyond intradisciplinary study, and
whether it is moving toward interdisciplinary or transdisciplinary study is still unpredictable, requiring more research
and discussion.

2.2 Visual Grammar 

Proposed by Kress and van Leeuwen in 1996, the framework of Visual Grammar builds on
the findings of the earlier semiotic school, which were originally used to analyze linguistic texts. Halliday sees
language as a semiotic mode which represents three metafunctions: the ideational metafunction, the interpersonal 
metafunction and the textual metafunction. Based on Halliday’s theory, Kress and van Leeuwen use different 
terms for the same subjects: representational instead of ideational; interactive instead of interpersonal; and 
compositional instead of textual. 

The representational meaning deals with the way images represent the relations between represented participants
in the picture. It is divided into narrative representation and conceptual representation. Narrative
representation can be further categorized into action process, reaction process and speech and mental process,
which draw on the six processes in Halliday’s Transitivity system. Action process is similar to material process
in SFL, while reaction process and speech and mental process correspond to verbal and mental process. While narrative
representation presents unfolding actions and events, processes of change and transitory spatial arrangements,
conceptual representation represents participants in terms of their more generalized and more or less stable and
timeless essence, in terms of class, structure or meaning (Kress & van Leeuwen, 2006, p. 79). Conceptual
representation is divided into classificational processes, analytical processes and symbolic processes, with the former
two resembling relational process in SFL.

The interactive meaning is mainly about the social relations between interactants and the evaluative orientations
that participants adopt towards each other and towards the world represented by the text. Its realization relies on four
elements, namely contact, social distance, attitude and modality. Contact concerns the relation between
represented participant and viewer and conveys meaning through ‘demands’ and ‘offers’, which are
related to Halliday’s description of four ‘speech acts’. Social distance represents social relations between the
producer, the viewer and the represented participant, and is realized by size of frame. Attitude, which classifies
images into subjective and objective ones, is deemed naturally neutral yet is altered by angles. Modality, which
comes from linguistics and refers to the truth value or credibility of (linguistically realized) statements about the
world (Kress & van Leeuwen, 2006, p. 155), explores the role of modality markers including color saturation,
color differentiation, color modulation, etc.

The compositional meaning relates the representational and interactive meanings of the image to each other
through three interrelated systems (Kress & van Leeuwen, 2006, p. 177): information value, salience and
framing. Information value concerns the placement of elements and the values
attached to them accordingly: the left and right arrangement represents given and new information, the top and bottom
arrangement represents ideal and real information, and the center and margin arrangement is discussed with respect to
cultural context. The construction also draws on Halliday’s viewpoint that the combination of known and new
information is the most common information structure. Salience refers to how a hierarchy of importance among
the elements is created through variations in size, sharpness of focus, tonal contrast, etc. Framing discusses the
relationship between the degree of connectedness and the significance of individuality or differentiation. The
latter two components of the compositional meaning provide practical insights into the interpretation of images.
However, the theory is not immune to over-interpretation or misinterpretation of images, considering the
possible subjectivity of the researcher or his/her unfamiliarity with unusual attributes that bear decisive
meanings.

Recent years have witnessed new developments in visual grammar, including theoretical refinement (Painter, Martin &
Unsworth, 2013), attempts at interdisciplinary discussion (Feng & Jing, 2011) and renewed research
perspectives (Bateman, 2014). However, we are still at the trial stage, where the quality of the output from this
exploration is still in need of refinement.

2.3 Previous Studies on Advertising 

Advertising discourse produces social meanings and symbolic value through various channels and influences
social culture and social relationships (Yang, 2007). In this field of study, traditional analysis focused on the
lexical, grammatical, stylistic and rhetorical features of language and its social functions (Leech, 1966; Williamson,
1978; Huang, 2001; Zhang, 2007; Hu, 2007a & 2007b). Leech (1966), the pioneer in advertising research,
investigated personal, imperative, passive voice and other linguistic features of advertising language. Barthes (1977) and
Williamson (1978) are representative of scholars who analyze advertising discourse from the semiotic
perspective: Barthes introduced the relationship between images and information in advertisements, and
Williamson holds that advertisement analysis is not only a process of coding and decoding (Han, 2011).
Vestergaard and Schroder (1985) explained the social motivation in advertising discourse from the pragmatic
perspective. Geis (1982) studied television commercial advertisements in terms of conversational implicature and
the cooperative principle and found that advertising language is persuasive and spreading. Cook (1992) discussed
the interactive function of text, music, picture and participants in advertising. Huang (2001) made use of
systemic functional grammar to study advertising discourse.

Research on online shopping festivals has been a hot topic in recent years, and the Double Eleven Shopping Carnival, as
the most successful one, is discussed most frequently. Previous studies can be divided into marketing analysis
(Feng, 2012; Liu, 2013) and statistical research (Ipsos, 2013), which focused more on broad tactical analysis and
assessment; the cultural and social background that contributed to Tmall’s success was rarely mentioned. Thus, this
study addresses these less explored aspects by conducting a multimodal analysis of the advertisement Tmall
produced for Double Eleven. Hopefully, the paper will inspire new thoughts for more original insights and a
broader perspective on the study of multimodal analysis and the study of advertisement.

3. Tmall’s Double Eleven Advertisement: A Multimodal Discourse Analysis 

The core objective of advertising is to specify and substantiate the value of a product or service to its potential
buyers. In order to present and transmit as much effective information to consumers as possible in a society
where visual culture has become increasingly important, commercial advertising has evolved from
single-mode to multimodal expression.

This part mainly discusses the visual components of the Tmall advertisement in order to gain a clearer
understanding of how the advertisement appeals to and resonates with consumers through multimodal construction
while propagating its value. In the ad, four scenes are presented, depicting respectively a gathering of female friends, a man
smashing a piggy bank with a wrecking ball, a woman and a wall with shaped holes, and a hardworking man.
For clarity, screenshots from the ad are presented accordingly.

3.1 Representational Meaning

According to Kress and van Leeuwen, conceptual patterns represent participants in terms of their class, structure
or meaning, in other words, in terms of their generalized and more or less stable and timeless essence, whereas narrative
patterns serve to present unfolding actions and events, processes of change and transitory spatial arrangements. The
hallmark of a narrative visual ‘proposition’ is the presence of a vector: narrative structures always have one,
conceptual structures never do (Kress & van Leeuwen, 2006, p. 59).

(1) Narrative Processes 

Narrative processes are classified into three sub-processes: action processes, reactional processes, and verbal and
mental processes.

In action processes, the Actor is the participant from which the vector emanates. When images or diagrams have
only one participant, this participant is usually an Actor, and the resulting structure is a non-transactional process;
when there are two participants, serving respectively as the Actor and the Goal, we call it a transactional process.
The Actor, in whole or in part, forms the vector, and its salience can be analyzed through size, place in the composition,
contrast against background, color saturation or conspicuousness, sharpness of focus, and through the
‘psychological salience’ which certain participants have for viewers (Kress & van Leeuwen, 2006, p. 63). In the
Tmall ad, transactional processes outnumber non-transactional ones, especially in the construction
of the highlights of each scene, where salient Actors are shown specifically ‘aimed at’ the Goal. This creates a
concrete context for the viewer, who is thus spared the trouble of pondering over any deeper
meanings. However, non-transactional processes do exist, and their occasional appearance is not
randomly arranged or edited. On the contrary, such a process usually appears at the early stage of a scene, with the
gesture of the represented participant forming a vector without a specific Goal in the frame. This forms an open-ended
situation for consideration and leaves the viewer imagining the psychological activities of the Actor, in other
words, relating to the image, thus creating a sense of empathy or identification.

In reactional processes, the vector is formed by an eye line, the direction of the glance of one or more of the
represented participants (Kress & van Leeuwen, 2006, p. 67). The Reactors, the participants who do the looking,
are humans in the Tmall ad. They convey the spirit of the holiday mainly through their facial expressions and
the direction of their glances, leading the viewer to see what they see, feel the way they feel and believe what they believe.
The Phenomenon may be formed either by another participant or by a visual proposition, and when there is no
Phenomenon, reactions are non-transactional. In the Tmall advertisement, the Phenomena are formed by the
piggy bank, the wall with shaped holes, the wall of piled files, etc., symbolizing the release of stress, the filling of a
mental void and the shattering of pressure from work respectively, all of which are realized by purchasing behaviors.
The function of the Phenomena is rather important for the advertisement in that they form the transactional
reaction, which is quite prominent among the reactional processes and presents concrete and
easy-to-comprehend imagery to the viewer to enhance the validity and legitimacy of the man-made festival.

In verbal and mental processes, a special kind of vector, in the form of thought balloons and dialogue balloons
that connect drawings of speakers or thinkers to their speech or thought, is discussed for its appearance in comic
strips. However, this vector does not appear in the ad, so the discussion of verbal and mental processes is omitted
from the analysis.

(2) Conceptual Processes 

The conceptual representation comprises classificational processes, analytical processes and symbolic
processes. The former two study the relationship between represented participants, which resembles the
relational process in SFL. Depending on whether a superordinate is shown in addition to the subordinates, classificational
processes are categorized into Covert Taxonomy and Overt Taxonomy. Analytical processes mainly discuss the
part-whole structure, involving two kinds of participants, namely Carrier (the whole) and Possessive Attributes
(the parts). The symbolic processes, which discuss the meaning of participants, can be divided into Symbolic
Suggestive and Symbolic Attributive, depending on whether there exist one or two participants respectively.
Symbolic processes are ubiquitous in the Tmall ad, and Symbolic Attributive appears more frequently.

Figure 1. Wall with shaped holes


Take one scene as a representative example. In the ad, a high wall with shaped holes (Figure 1), which resemble
the shapes of dresses, bags, shoes, dolls, furniture, etc., is presented. The wall itself, as a represented participant,
is made salient through its size in the picture and its eye-catching color, and the holes visualize the source of the
mental void. With the red-dressed woman standing before the wall, displaying a sense of yearning and then of fulfillment
after the void is filled, the ad equates mental void with unsatisfied material needs and provides a solution
for achieving fulfillment in the quickest way: consumption.

3.2 Interactive Meaning 

The interactive dimension of images is the “writing” of what is usually called “non-verbal communication”, a 
“language” shared by producers and viewers alike. It involves two kinds of participants, represented participants 
(the people, the places and things depicted in images) and interactive participants (the people who communicate 
with each other through images, the producers and viewers of images), and three kinds of relations: (1) relations 
between represented participants; (2) relations between interactive and represented participants (the interactive 
participants’ attitudes towards the represented participants); and (3) relations between interactive participants 
(the things interactive participants do to or for each other through images) (Kress & van Leeuwen, 2006, p. 114).
The interactive meaning is realized by four factors: contact, social distance, attitude and modality. 

(1) Contact 

The visual configuration has two related functions. On the one hand, an image may create a visual form of direct
address, in which contact is established by the direct eye gaze or gestures of represented participants; in this case, we
call this kind of image a ‘demand’. On the other hand, an image may also address the viewer indirectly; in this
case, it ‘offers’ the represented participants to the viewer as items of information, objects of contemplation,
impersonally, as though they were specimens in a display case (Kress & van Leeuwen, 2006, p. 119).

In some contexts (for instance, television newsreading and the posed magazine photograph) the ‘demand’
picture is preferred: these contexts require a sense of connection between the viewers and the authority figures,
celebrities and role models they depict. In other contexts (for example, feature film, television drama and
scientific illustration) the ‘offer’ is preferred: here a real or imaginary barrier is erected between the represented
participants and the viewers, a sense of disengagement, in which the viewer must have the illusion that the
represented participants do not know they are being looked at, and in which the represented participants must 
pretend that they are not being watched (Kress & van Leeuwen, 2006, p. 120). 

In the Tmall ad for Double Eleven, ‘offer’ images are presented. The represented participants, women and men,
do not make direct eye contact with viewers; however, their facial expressions and conduct display the happiness
and enjoyment gained from the man-made festival. The intangible yet undeniable link between happiness and
consumption is strongly indicated to the viewers. What the ad offers its viewers is the sense of fulfillment
people feel after buying the commodities they wanted, which insinuates that the viewers can also feel satisfied
and live a happier life by mimicking what the represented participants do, in other words, by buying commodities from
Tmall.

(2) Social Distance 

The choice of distance can suggest different relations between represented participants and viewers: the
construction of the image itself can make the viewer feel close to or far away from the represented participant. In our
daily interactions, the norms governing social relations determine the distance we
should keep from each other and influence the way we interact with each other.

In advertisements, size of frame is invariably defined in relation to the human body. According to Kress and van
Leeuwen, the close shot (close personal distance) shows the head and shoulders of the subject, and the very close
shot (intimate distance) anything less than that. The medium close shot (far personal distance) cuts off the
subject approximately at the waist, the medium shot (close social distance) approximately at the knees. The
medium long shot (far social distance) shows the full figure. In the long shot (public distance) the human figure
occupies about half the height of the frame, and the very long shot is anything ‘wider’ than that (Kress & van
Leeuwen, 2006, pp. 124-125). In short, the distance of depicted participants from the viewer is one of the
material realizations of social interaction in a given context.
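
As a rough illustration of how the size-of-frame categories above could serve as a coding scheme when annotating
screenshots, the following Python sketch maps each shot type to the social distance named in the paragraph. The
names SHOT_TO_DISTANCE, social_distance and tally_distances are hypothetical and are not part of Kress and van
Leeuwen's framework. The very long shot is mapped to None because the passage names no distance for it.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Shot size -> social distance, exactly as listed in the paragraph above.
# The passage names no distance for the very long shot, so it maps to None.
SHOT_TO_DISTANCE: Dict[str, Optional[str]] = {
    "very close shot": "intimate distance",
    "close shot": "close personal distance",
    "medium close shot": "far personal distance",
    "medium shot": "close social distance",
    "medium long shot": "far social distance",
    "long shot": "public distance",
    "very long shot": None,
}


def social_distance(shot: str) -> Optional[str]:
    """Return the social distance associated with a coded shot size."""
    key = shot.strip().lower()
    if key not in SHOT_TO_DISTANCE:
        raise ValueError(f"unknown shot size: {shot!r}")
    return SHOT_TO_DISTANCE[key]


def tally_distances(shots: Iterable[str]) -> Counter:
    """Count how often each social distance occurs in a list of coded shots."""
    return Counter(social_distance(shot) for shot in shots)


if __name__ == "__main__":
    # Hypothetical coding of a few frames; these are NOT data from the Tmall ad.
    coded_frames = ["medium shot", "medium long shot", "medium shot", "close shot"]
    for distance, count in tally_distances(coded_frames).most_common():
        print(distance, count)
```

A tally of this kind only summarizes an analyst's manual coding; deciding which shot size a given frame
belongs to remains an interpretive judgment.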

In the ad, images showing the head and shoulders and images showing the whole figure with space around it appear most
frequently, accompanied by the occasional long shot. The rare showing of intimate distance and detachment is by no
means a coincidence. In some ads, a close shot is employed in order to enhance the feeling of involvement and
intimacy. In other cases, a long shot is employed, and although the represented participants are looking at the
viewers, the distance weakens the impact on us.

In the Tmall ad, however, not all scenes are shot at close distance. Instead, the medium shot and medium long shot
are used more frequently, along with the occasional appearance of other sizes of frame. Besides, flexible camera
movement between close and long shot appears frequently. Several reasons can be inferred from the aim
and features of the ad. (1) The ad attempts to create an atmosphere of joyfulness by sketching in the details of
every scene. In this case, the ad does not only try to capture individual feelings but, more
importantly, heightens the overall joyous and appealing atmosphere of this special day. Apart from building
up and spreading the spirit, a commercial must also trigger purchases. Thus, (2) the delicate
distance between the represented participants and the viewer in most scenes, to a certain extent, prevents viewers
from immersing themselves fully in the scene the ad creates, as they are watching at a far social distance,
in other words, the distance of the ‘stranger’. The fact that the images in the passionate ad alone cannot
satisfy the viewers’ craving for fulfillment subconsciously triggers buying behavior. It is also important that the
distance is not so far as to seem unreachable to the audience; hence the long shot, which indicates
public distance, is rarely used. The images in the ad thus create a subtle distance, and it is
insinuated that the audience would need to make purchases in order to fill the gap. (3) The frequent camera movement between
close and long shot also plays a role in triggering consumption. In the ad, the camera either zooms
in from a long shot to a close shot, or moves from a scene showing several represented participants
to the face or head of a specific represented participant.

(3) Attitude 

Attitude concerns the relations between represented participants and the viewer. It is often socially determined
and is realized through the system of perspective, usually by the selection of the angle. The
selection of an angle, a ‘point of view’, implies the possibility of expressing subjective attitudes towards
represented participants, human or otherwise (Kress & van Leeuwen, 2006, p. 129).

The horizontal angle forms either a frontal or an oblique point of view, which respectively signify different degrees
of involvement or detachment. A represented participant shown from a frontal angle indicates that special attention
needs to be paid to him or her; a participant shown from an oblique angle, on the other hand, usually suggests otherwise.
In the Tmall ad, the frontal angle is widely used in the construction of images so that the audience has a greater
feeling of being involved in the holiday atmosphere. However, there is one exception: in the scene depicting a
wall of shaped holes filled with commodities (Figure 2), the represented participant is given a back view. This
scene invites us to pay more attention to the wall; meanwhile, the woman does not only represent an individual
who has a specific identity: she becomes ‘us’, the viewers of the ad.

Figure 2. Wall filled with commodities


Vertical perspective is realized by camera height and signifies different degrees of power. A high angle, it is said,
makes the subject look small and insignificant, with the tendency of diminishing the individual; on the other
hand, a low angle makes it look imposing and awesome, giving an impression of superiority, exaltation and
triumph (Kress & van Leeuwen, 2006, p. 140), and an eye-level angle indicates an equal relationship and gives
a sense of neutral documentation.

In the Tmall ad, the eye-level shot is generally used to suggest the normality of being happy with material
sufficiency. One scene worth paying special attention to, however, is the one that depicts a hard-working man
surrounded by piles of documents from a high angle (Figure 3), which adds to the feeling of exhaustion from the
heavy workload. The following scene, shot from a low angle, witnesses a drastic change in his overall state, showing
his smiling and relieved face in a medium close shot. His figure accordingly becomes more significant, and the
mood of the scene changes into a light-hearted one as Double Eleven comes. The indication that the
man-made festival can save people from a heavy workload is presented through the carefully constructed
perspective.

Instead of a high or low angle, an eye-level angle is used in presenting the image, which signifies equality.
Shooting from a physically higher angle and looking down on the subject usually makes the subject look small or
weak, whereas a low-angle shot taken from below makes the subject look more powerful and spectacular. A
neutral shot or eye-level angle, however, has little psychological effect on the viewer, as the camera looks
straight at the subject.



Figure 3. Man surrounded by document piles 


(4) Modality 

The term ‘modality’ comes from linguistics and refers to the truth value or credibility of (linguistically realized)
statements about the world (Kress & van Leeuwen, 2006, p. 155), and the specific degrees of modality are
measured by modality markers including color saturation, differentiation and modulation, contextualization,
representation, depth, illumination and brightness. In the Tmall ad, full color saturation, a highly diversified range of
colors and fully modulated color are prominent, along with an articulated and detailed background, a low degree of
abstraction and a high brightness value. With these features, the ad presents a light-hearted and cheerful
atmosphere that echoes the theme of the festival. The detailed depiction, along with a sense of
approval created by the positive atmosphere, helps to justify willful consumption on this very day.

3.3 Compositional Meaning 

Composition relates the representational and interactive meaning of the image to each other through three 
interrelated systems (Kress & van Leeuwen, 2006, p. 177): information value, salience and framing. 

(1) Information Value 

The placement of elements (participants and syntagms that relate them to each other and to the viewer) endows 
them with the specific informational values attached to the various ‘zones’ of the image: left and right, top and 
bottom, centre and margin (Kress & van Leeuwen, 2006, p. 177). The left to right composition shows the 
information from given to new, top to bottom shows the information from ideal to real, and centre and margin 
composition signifies the greater emphasis on the central part of images. 

In the Tmall ad, most scenes follow the centre and margin construction, in which the represented participants are
usually the highlighted part of the image, reemphasizing the role of the general public as the central point of the festival.
Central composition is relatively uncommon in contemporary Western visualization, though. According to
Kress and van Leeuwen, perhaps it is the greater emphasis on hierarchy, harmony and continuity in Confucian 
thinking that makes centring a fundamental organizational principle in the visual semiotic of Asian culture 
(Kress & van Leeuwen, 2006, p. 195). 



Figure 4. Wine glass 


Apart from that, at the beginning of the ad, a cherry is shown dropping into a glass of wine (Figure 4) right after
a happy woman is shown falling from mid-air. Goatly (2007, pp. 36-37) stated that “high is also a metaphor for
power and dominance, as in high places, high handed”, and “the symbolism of height as power is especially
noticeable in the penchant for tall buildings”. The scene of a woman falling from mid-air is a typical downward
movement, as is the dropping cherry. The vertical movement from top to bottom indicates the change of state
from being ideal and unreachable to being real and accessible. Unlike automobile advertisements, in which cars often
display upward movements indicating outstanding mechanical performance as well as the driver’s capability
of moving upward in society (Feng & Xing, 2011, p. 60), the Tmall ad aims to highlight accessibility. Along
with the voice-over saying “On this day, I’ll taste the sweetness of losing control”, the glass of wine can be seen
as the routine life before the shopping carnival, and the cherry, symbolizing the breaking of routine, can be viewed
as a sweet, surprising gift worth embracing and made reachable through the festival. The ad thereby justifies the
day by comparing the shopping carnival to a sweet surprise.

(2) Salience 

The elements (participants as well as representational and interactive syntagms) are made to attract the viewer’s 
attention to different degrees, as realized by such factors as placement in the foreground or background, relative 
size, contrasts in tonal value (or color), differences in sharpness, etc. (Kress & van Leeuwen, 2006, p. 177). Building 
on this idea, Machin (2007, p. 130) states that “Salience is where certain features in composition are made to 
stand out to draw the viewer’s attention”. 

One representative scene is Figure 1. In the screenshot, although the woman in the red dress is in the 
foreground, the wall in the background, marked by the golden light filtering in through its shaped holes, 
is the more salient element. Its salience and eye-catching golden color highlight the commodities and 
glorify their importance. 



Figure 5. Man riding a wrecking ball 


In one scene, the ad shows a piggy bank and a man riding a wrecking ball (Figure 5). The color contrast between 
the cool white and the garish red is striking: strongly saturated red dots all over the room create an intense and 
anxious atmosphere, which echoes the stress of everyday life. In this way, keeping savings in the bank is depicted as 
the root cause of anxiety. The piggy bank, long used as a reminder to save money, 
is crashed into by a silver wrecking ball ridden by a young man. The action is depicted in a way 
that makes it look like a light-hearted game, insinuating that spending money on this very day can ease the 
anxiety brought about by a social norm that encourages frugality, and thus encouraging viewers to do the same 
thing: to take their money out of the bank and buy themselves some pleasure. The color contrast contributes 
greatly to the construction of this hidden message. 

(3) Framing 

The presence or absence of framing devices (realized by elements which create dividing lines, or by actual frame 
lines) disconnects or connects elements of the image, signifying that they belong or do not belong together in 
some sense (Kress & van Leeuwen, 2006, p. 177). The more the elements of the spatial composition are 
connected, the more they are presented as belonging together, as a single unit of information (Kress & van 
Leeuwen, 2006, pp. 203-204). Connectedness is the dominant feature of the multimodal construction of 
the ad, but there is one exception. In Figure 6, the image is cut into two parts by the 
discontinuous color of the double door and the space in between. The scene does not last long, but it 
plays a critical role in the ad. The lady running through the closing door creates a sense of 
tension, since the door might close before she enters. This insinuates that taking part in the shopping 
carnival is worth striving for and, since the festival lasts for only one day, that viewers should 
seize the seemingly precious chance. Although the lady is at the centre of the image, the focal 
point is the closing door, which heightens the sense of a diminishing chance. At this point, the vague figure of the 
represented participant suggests that she no longer represents a specific individual, but the general public. 



Figure 6. Woman running through a door 
4. Discussion 

The aim of this section is to discuss the findings of the preceding analysis and examine how the advertisement 
successfully applies the AIDA advertising principle (attention, interest, desire, action) through multimodal 
construction. The advertising patterns described by Cook (2001) are also discussed in this section. 

According to Lewis (1903), the mission of an advertisement is to attract a reader, so that he will look at the 
advertisement and start to read it; then to interest him, so that he will continue to read it; then to convince him, so 
that when he has read it he will believe it. Later on, the formula was further developed and widely used in 
marketing analysis. In Habermas’s theory of Communicative Action, this type of business action belongs to 
strategic action, in which the advertiser appeals to the desires of the consumer so as to motivate the consumption that 
is required for the advertiser’s success. Cook (2001, p. 232) puts it in plainer terms, stating that 
advertising attracts receivers’ attention, then tricks them into buying the product by appealing to greed, vanity, lust 
and fear, and by suggesting that purchase will make the receiver like the people portrayed. 

(1) Attention 

As Crawford (2015) puts it, “Attention is a resource—a person has only so much of it”. When it comes to 
advertising, it is even harder to get attention since ads, for many people, are either not at the centre of attention 
or do not hold attention for long (Cook, 2001, p. 222). 

Advertising not only provides information but, more importantly, employs proven methods of 
persuasion to capture the consumer’s attention. Three methods can be employed (Suggett, 2014), namely location, 
shock factor and personalization. In Tmall’s case, the emphasis on personalization is the most salient feature. 

a. Represented participants 

There are six represented participants in total, four women and two men, all between the ages of 20 and 30. This 
age range implies that the audience and major consumers are the younger generation, who are relatively more 
active in online shopping. The fact that women outnumber men further indicates that women are 
the main target of the ad. Another feature is that no celebrities appear in any of the scenes. Instead, 
regular-looking people are the leading figures. The advertisement aims to shorten the social distance from the 
general public by presenting participants similar to average people. In this way, a similarly happy life appears more 
accessible to the viewers, and it is easier for them to relate to the scenes. By bringing ordinary 
people into the limelight, the ad conveys the idea that Double Eleven is a festival for the general public. 

b. Specified context 

Although the ad encourages shopping, Tmall does not attract viewers by presenting the manifold commodities it 
can provide. Instead, the ad constructs four different scenes whose themes are close to the daily life of its 
audience. 

The four scenes depict, respectively, female friends gathering, a man crashing into a piggy bank with a wrecking ball, a 
woman and a wall with shaped holes, and a hard-working man surrounded by document files. 
While gatherings of friends and hard-working employees are commonplace, the man riding a 
wrecking ball and the woman filling a wall of shaped holes with commodities may seem less likely to occur in 
daily life. It is, however, the themes conveyed through the ad that are common. By crashing into the piggy bank, the 
represented participant of that scene visually enacts what many viewers would not dare to do in daily 
life: purchase commodities without worrying too much about the bank account. By presenting the scene, the ad 
offers the audience a day on which the guilt of unplanned buying is lifted. The wall with shaped 
holes is another metaphor, one that equates mental void with a lack of material goods. In a 
society where the loneliness and emptiness brought by fast-paced modern life have become increasingly overwhelming, 
the ad visualizes and addresses this specific context. 

(2) Interest 

To increase and maintain viewers’ attention, ‘Unrealism’ is sometimes used in describing a bland and 
problem-free world: the families are happy; the days are sunny; the meals tasty; the Christmases snowy; the 
grannies kind; the roads uncongested; the countryside unspoiled; the farming traditional (Cook, 2001, p. 224). 
Nonetheless, this selective, problem-avoiding approach does not dominate the Tmall ad. Across the four 
scenes depicted in the ad, two deeds normally deemed unconventional, as well as prevailing social phenomena, are 
addressed. 

Losing control and spending money as we wish without planning ahead are often deemed inappropriate. 
However, the ad presents this inappropriateness in a positive and joyous atmosphere. In the first scene, the guilty 
pleasure of losing control with friends is presented light-heartedly, and in the second 
scene, casual consumption is depicted as a thrilling game. The prominent feature heightening these playful scenes 
is the use of color. In modern life, color plays an important role in expressing meaning. For instance, in traffic 
signage, red is used on warning signs and green on safety signs. In hospitals, rooms are painted white for the purpose of 
calming patients down (Zhou, 2012, p. 42). In visual analysis, modality refers to the degree to which 
certain image-representing devices (color, represented detail, depth, tone, etc.) are used (Norris, 2004, p. 
256). The Tmall advertisement relies heavily on bright, warm colors, and its full palette of diverse colors 
conveys a cheerful and positive feeling. Specifically, the viewer can connect the bright colors with happiness, 
hope and cheerfulness, and the happier viewers become, the more likely they are to complete the purchase 
procedure. The artistic use of hyperbole creates a sense of resonance for the viewers. 

Instead of presenting the abstract idea that “consumption makes people happy”, the ad articulates detail in a finely 
grained way. Through detailed depiction, the ad tries to equate sufficient material wellbeing with happiness 
and fulfillment. To make the abstract idea of happiness more concrete, images in the ad present details to their 
fullest. 

The presentation of prevailing phenomena in modern society is also a key factor in maintaining the audience’s 
attention. In the third scene, a young woman who chooses to fill her mental void with commodities is portrayed. 
Mental void has become a commonplace issue that individuals need to deal with on their own in modern society. 
In the ad, the void is visualized as a wall with holes, and by filling them with products, the void is filled as well. 
The visualization of filling the void with accessible products that can be easily bought from Tmall presents a 
soothing way of dealing with the problem. In the last scene, a man surrounded by paperwork is depicted from a 
high angle. By presenting this commonly seen situation, the ad evokes empathy among viewers, since a large 
proportion of the target audience are employees whose workload can also be excessively heavy. By relating to 
the viewers, the ad maintains their interest. 

(3) Desire and Action 

After attention has been gained and interest maintained, it is necessary to transform that 
interest into desire for the product or service. By showing four different yet specific situations, the ad presents the 
desires of the represented participants light-heartedly and shows them being satisfactorily met. This persuasiveness 
and accessibility justify the reasonableness of the man-made festival. 

Though staged, the scenes reflect realistic problems to which viewers can relate. Because viewers see how the 
men and women enjoy themselves and solve their problems, there is no need to shout slogans to win the 
audience’s heart; the ad speaks for itself. By showing that the festival is a day for the general public, 
the ad attracts attention; by presenting down-to-earth scenarios in an acceptable manner, it maintains interest; 
and through the specificity of the scenes and the cheerful atmosphere, it relates to the viewers and 
offers a quick solution: consumption. The ad thus arouses interest and creates desire, which could very likely turn into 
consumption. Even if it does not, the spirit and essence of Double Eleven have run deep into consumers’ 
perception of the day, which has, to a certain degree, become part of consumer culture. 

5. Conclusion 

In the Tmall Double Eleven advertisement, the multimodal construction successfully realizes the 
purpose of the ad: promoting the festival culture and triggering consumption. Based on the preceding analysis, the 
research questions are answered as follows. 

The main tactic employed to appeal to and resonate with the consumer is to create a delicate balance between 
immersion and distance through multimodal construction. The use of bright color presents a vivid and positive 
image to the viewers, creating a warm and relaxing atmosphere for the consumers. Images in the ad alternate 
between close and medium social distance, accompanied by occasional long shots. This aims to create a 
situation that is not too immersive for the viewers yet still accessible, thus triggering consumption. 
Also, the avoidance of direct contact establishes a strong link between happiness and consumption. 

As for the justification of material consumption, the advertisement does not bluntly display its products or 
services for sale. Instead, it caters to common deficiencies in our society and provides solutions, thus 
making the festival a day for people to fix problems and enjoy themselves. Consumption is necessary at times, 
yet it is also viewed as something that needs to be done rationally. The advertisement, however, dispels this 
discomfort by positioning the shopping carnival as a means of letting go. It is not only about letting go of tightly 
saved money, but also of strained nerves, accumulated pressure, and the despair aroused by emotional 
emptiness. By echoing this social need, the advertisement serves more as a spiritual release than as a 
mere promotion. 

With multimodal components playing important roles in the justification of the Double Eleven Shopping Carnival, 
the ad reflects the spirit and motivation of the carnival, as well as the current social state. By revealing the successful 
tactics employed by Tmall to promote the festival, the analysis not only provides another perspective for 
multimodal discourse analysis, but also serves as a reference for other enterprises intending to renew and 
expand their social influence. Apart from that, the writer also hopes to help raise consumers’ awareness by 
presenting how advertisement producers practice psychological manipulation on the general public. 

The analysis also hopes to shed light on cultivating students’ multimodal literacy, thereby fostering 
a better understanding of, and a more critical perspective on, the underlying purposes of 
advertisements, which are prominent in today’s society. 

In spite of the findings listed above, several limitations need to be discussed. First, culture 
affects the applicability and validity of visual grammar. As Kress and van Leeuwen say, visual grammar is 
“quite a general grammar of contemporary visual design in ‘Western’ culture” (Kress & van Leeuwen, 2006, p. 3). 
The irreversible trend of globalization has brought about better understanding and communication among cultures; 
however, the differences are not to be overlooked. Similarly, it would be negligent not to take cultural differences 
into consideration when borrowing foreign theories for domestic studies. Although this study tries to take the Chinese 
societal status quo into consideration under the framework proposed by Kress and van Leeuwen, a more 
comprehensive account of Chinese culture and of the factors influencing domestic consumers’ current buying 
behavior could be incorporated to better fit this specific context. Apart from that, in the 
discussion of compositional meaning, size and placement are two of the decisive factors in determining the 
degree of importance. However, the values attributed to them may vary greatly across societies, and the balance between 
generalization and specification still needs refinement. 

Second, the inherent limitations of visual grammar also need to be addressed. For example, the function of 
color is deemed significant in visual grammar and is discussed under several different types of meaning. The resulting 
complexity and possible overlap in discussion might be avoided in the future by modifying the structure of the 
framework. The subjectivity of the analytical process is also a problem in need of discussion. For example, in determining 
the attitude of images, the degree of objectivity conveyed by different angles can be altered and influenced by other 
prominent factors in the very same image. 

Third, though the advertisement produced by Tmall serves as a good example for academic research, 
the research object could be expanded for a more comprehensive study. Moreover, owing to the complexity of 
video advertisements, it is impractical to analyze every frame in detail; thus the most critical frames, which 
are also less likely to cause bias and confusion, were selected for discussion. 

Last, owing to the author’s limited academic capacity, the study takes into consideration only visual grammar 
for analysis. More modes and related theories could be included in future research in an attempt to produce more 
valuable academic results. 
Appendices 

Appendix 1. Link for Discourse analysis data 

Name: Tmall 2014 Double Eleven Advertising (0:00:30) 

Produced by: Tmall 
Date: 2014 

Source URL: https://www.youtube.com/watch?v=DwiVCflrMq4&feature=youtu.be 